Empirical Performance-Model Driven Data Layout Optimization
Abstract
Empirical optimizers like ATLAS have been very effective in optimizing computational kernels in libraries. The best choice of parameters such as tile size and degree of loop unrolling is determined by executing different versions of the computation. In contrast, optimizing compilers use a model-driven approach to program transformation. While the model-driven approach of optimizing compilers is generally orders of magnitude faster than ATLAS-like library generators, its effectiveness can be limited by the accuracy of the performance models used. Very often, simplified abstractions such as reuse distance are used instead of more accurate metrics like cache miss cost, in order to make model-driven compiler optimization feasible. In this paper, we describe an approach in which a class of computations is modeled in terms of its constituent operations, which are empirically measured. The empirical measurement of the constituent operations allows modeling of the overall execution time of the computation. The performance model with empirically determined cost components is used to perform data layout optimization in the context of the Tensor Contraction Engine, a compiler for a high-level domain-specific language for expressing computational models in quantum chemistry. This may be viewed as an example of the telescoping language approach to optimizing a sequence of library calls using semantic information about the performance properties of the library routines. The effectiveness of the approach is demonstrated through experimental measurements on some representative computations from quantum chemistry.
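The general idea of combining measured cost components with a layout search can be illustrated with a minimal sketch. It is not the Tensor Contraction Engine's algorithm: the helper names `measure` and `choose_layouts`, the two layouts "row" and "col", and all cost values are hypothetical. The sketch assumes a simple chain of operations in which each step has one cost per layout and switching layouts between consecutive steps incurs a conversion cost; in practice those numbers would come from timing the constituent routines.

```python
import time
from typing import Callable, Dict, List, Tuple


def measure(fn: Callable[[], object], repeats: int = 3) -> float:
    """Return the best-of-`repeats` wall-clock time of fn() in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def choose_layouts(
    op_cost: List[Dict[str, float]],
    conv_cost: Dict[Tuple[str, str], float],
) -> Tuple[float, List[str]]:
    """Pick one layout per step so that the modeled total time is minimal.

    op_cost[i][layout] is the (measured) cost of step i in that layout.
    conv_cost[(a, b)] is the (measured) cost of converting layout a to b.
    Dynamic programming over the chain: for every layout, keep the
    cheapest plan that ends the current step in that layout.
    """
    best = {layout: (cost, [layout]) for layout, cost in op_cost[0].items()}
    for step in op_cost[1:]:
        nxt = {}
        for layout, cost in step.items():
            nxt[layout] = min(
                (
                    prev_cost
                    + (0.0 if prev == layout else conv_cost[(prev, layout)])
                    + cost,
                    plan + [layout],
                )
                for prev, (prev_cost, plan) in best.items()
            )
        best = nxt
    return min(best.values())


if __name__ == "__main__":
    # Hypothetical costs; real values would be obtained with measure(...).
    op_cost = [
        {"row": 4.0, "col": 1.0},
        {"row": 1.0, "col": 5.0},
        {"row": 1.5, "col": 1.0},
    ]
    conv_cost = {("row", "col"): 2.0, ("col", "row"): 2.0}
    print(choose_layouts(op_cost, conv_cost))  # (5.5, ['col', 'row', 'row'])
```

With these made-up numbers, picking the cheapest layout for each step in isolation would give col, row, col at a modeled cost of 7.0 once the two conversions are counted, whereas the search over the whole chain finds a plan costing 5.5. This is the kind of trade-off a performance model with conversion costs is meant to expose.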
Similar resources
Empirical performance model-driven data layout optimization and library call selection for tensor contraction expressions
Empirical optimizers like ATLAS have been very effective in optimizing computational kernels in libraries. The best choice of parameters such as tile size and degree of loop unrolling is determined in ATLAS by executing different versions of the computation. In contrast, optimizing compilers use a model-driven approach to program transformation. While the model-driven approach of optimizing com...
Automatic Data Layout Using 0-1 Integer Programming
In optimization today, even the number of processors used for parallel programming is carefully considered. The goal of high-level languages is to provide a simple yet efficient machine-independent parallel programming model. The programmer’s data parallel programs should compile and execute with good performance on many different architectures. However t...
Multi-level Data Layout Optimization for Heterogeneous Access Patterns
GONG, ZHENHUAN. Multi-level Data Layout Optimization for Heterogeneous Access Patterns. (Under the direction of Dr. Nagiza F. Samatova.) Recent years have seen an enormous increase in computation power of leadership computing facilities. As a result, huge amounts of data, from terascale to petascale, are being produced by scientific applications running on supercomputers. However, the I/O subsy...
Applications of two new algorithms of cuckoo optimization (CO) and forest optimization (FO) for solving single row facility layout problem (SRFLP)
Due to the inherent complexity of real optimization problems, developing a solution algorithm for them has always been challenging. The single row facility layout problem (SRFLP) is an NP-hard problem of arranging a number of rectangular facilities with varying length on one side of a straight line with the aim of minimizing the weighted sum of the distance between all facility...
A new layout-driven timing model for incremental layout optimization
In this paper we present a new layout-driven timing model based on Asymptotic Waveform Evaluation (AWE) for improved timing analysis during routing. Our model enables the bottom-up computation of interconnect tree moments, and can be easily integrated with such a global router. Such an integration achieves incremental layout optimization, i.e., timing analysis and routing are tightly coupled, w...